Sarcopenia prediction using shear-wave elastography, grayscale ultrasonography, and clinical information with machine learning fusion techniques: feature-level fusion vs. score-level fusion

This study aimed to develop and evaluate a sarcopenia prediction model by fusing numerical features from shear-wave elastography (SWE) and gray-scale ultrasonography (GSU) examinations, using the rectus femoris muscle (RF) and categorical/numerical features related to clinical information. Both cohorts (development, 70 healthy subjects; evaluation, 81 patients) underwent ultrasonography (SWE and GSU) and computed tomography. Sarcopenia was determined using skeletal muscle index calculated from the computed tomography. Clinical and ultrasonography measurements were used to predict sarcopenia based on a linear regression model with the least absolute shrinkage and selection operator (LASSO) regularization. Furthermore, clinical and ultrasonography features were combined at the feature and score levels to improve sarcopenia prediction performance. The accuracies of LASSO were 70.57 ± 5.00–81.54 ± 4.83 (clinical) and 69.00 ± 4.52–69.73 ± 5.47 (ultrasonography). Feature-level fusion of clinical and ultrasonography (accuracy, 70.29 ± 6.63 and 83.55 ± 4.32) showed similar performance with clinical features. Score-level fusion by AdaBoost showed the best performance (accuracy, 73.43 ± 6.57–83.17 ± 5.51) in the development and evaluation cohorts, respectively. This study might suggest the potential of machine learning fusion techniques to enhance the accuracy of sarcopenia prediction models and improve clinical decision-making in patients with sarcopenia.


Correlation between SMI and USG measurements
Figure 1 shows the correlation between the SMI and USG measurements. There was a moderate to strong positive correlation between SMI and RF thickness in both cohorts (r = 0.523–0.675, p < 0.001) (Fig. 1A). Additionally, a moderate to strong positive correlation was observed between the SMI and CSA of RF in both cohorts (r = 0.575–0.662, p < 0.001) (Fig. 1B). There was a moderate negative correlation between SMI and SCF thickness in both cohorts (r = −0.448 to −0.393, p < 0.001) (Fig. 1C). Furthermore, the correlation between SMI and SWV of RF was strongly positive in the development cohort (r = 0.652, p < 0.001) and weakly positive in the evaluation cohort (r = 0.330, p = 0.003) (Fig. 1D).
www.nature.com/scientificreports/
Combining the clinical and USG features at the score level using LSE and AdaBoost yielded better prediction performance than either the clinical or the USG features alone (LSE: 82.53% accuracy, 60.92% sensitivity, 88.70% specificity, 68.83% PPV, 89.13% NPV, and 78.35% AUC; AdaBoost: 83.17% accuracy, 59.97% sensitivity, 89.80% specificity, 67.47% PPV, 88.57% NPV, and 74.78% AUC), whereas score-level fusion using the remaining methods degraded prediction performance. All prediction accuracy differences after fusion compared to the clinical and USG features were statistically significant (p < 0.001 for all cases). Figure 2 shows the test ROC curves obtained from the experiments using the development and evaluation datasets. Most methods showed similar AUC performance, with the exception of score-level fusion based on SVM and AdaBoost; score-level fusion based on AAC minimization achieved the most reliable AUC performance in the development set (Fig. 2A). For the evaluation set, the clinical features showed the most reliable AUC performance, while feature-level fusion and score-level fusion based on SR, LSE, and AAC showed comparable or slightly better performance than the clinical features over part of the range (Fig. 2B).

Statistical significance tests
To validate the differences in performance before and after combining the clinical and USG features, we applied a paired t-test to the accuracy values. Comparing the accuracy of the clinical features with that of feature- and score-level fusion for the development set, score-level fusion based on the sum rule (p < 0.001) outperformed the clinical features, while score-level fusion based on LSE (p = 0.006) performed significantly worse than the clinical features. In contrast, feature-level fusion (p = 0.836) and score-level fusion based on AAC optimization (p = 0.059), random forest (p = 0.128), SVM (p = 0.913), and AdaBoost (p = 0.413) showed no statistically significant differences. For the USG features, score-level fusion using the sum rule (p = 0.006) and AdaBoost (p = 0.046) performed significantly better than the USG features, whereas the remaining methods showed no statistically significant difference.
Using the evaluation set, feature-level fusion and score-level fusion based on LSE and AdaBoost showed significantly better accuracy than the clinical features (p < 0.001). Score-level fusion based on the sum rule, AAC optimization, random forest, and SVM showed significantly worse accuracy than the clinical features (p < 0.001). Combining the clinical and USG features at the feature and score levels showed significantly better accuracy (p < 0.001 for all methods) than the USG features alone.
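The paired t-test used for these comparisons can be illustrated with a minimal sketch. The accuracy values below are made up for illustration and are not results from this study, and the function name `paired_t_statistic` is hypothetical. The sketch computes only the t statistic and the degrees of freedom; a p-value would then be read from a t distribution, for example with `scipy.stats.ttest_rel`.

```python
import math


def paired_t_statistic(acc_a, acc_b):
    """Return (t, degrees of freedom) for paired accuracy values.

    acc_a[i] and acc_b[i] are assumed to come from the same test split,
    so the test is carried out on their differences.
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t = mean_diff / math.sqrt(var_diff / n)
    return t, n - 1


# Illustrative accuracies of a fused model and a single-modality model.
fused = [0.84, 0.82, 0.85, 0.83, 0.86]
single = [0.81, 0.80, 0.82, 0.82, 0.82]
t_value, dof = paired_t_statistic(fused, single)
print(f"t = {t_value:.3f}, degrees of freedom = {dof}")
```

A large positive t with few degrees of freedom, as in this toy case, indicates that the first method was consistently more accurate across the paired runs.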

Discussion
This study investigated the usefulness of USG measurements and clinical characteristics to predict the presence of sarcopenia. Significant differences were observed in clinical characteristics (sex and height) and GSU measurement (thickness of SCF) between the "sarcopenia" and "non-sarcopenia" groups in both cohorts. Clinical features were better predictors of sarcopenia than USG features. The fusion of clinical and USG features at the feature level demonstrated accuracy similar to that of the clinical features in predicting sarcopenia for the development cohort and better accuracy for the evaluation cohort. Score-level fusion based on SR, SVM, and AdaBoost exhibited improved accuracy for sarcopenia prediction in the development cohort, while score-level fusion based on LSE and AdaBoost showed improved sarcopenia prediction accuracy in the evaluation cohort.
The development cohort comprises individuals with a BMI below 25 kg/m² and a median age of 45.7 years. They are free of diseases capable of influencing muscle degeneration or wasting. On the other hand, the evaluation cohort comprises a more clinically diverse group of individuals, as encountered in clinical practice, with conditions that extend beyond the typical healthy range and may affect muscle wasting. This diversity might contribute to the better accuracy in assessing sarcopenia within the evaluation cohort compared to the development cohort. Sarcopenia is defined as the loss of muscle mass and function (strength) 11. Muscle function depends on the contractility of muscle, which is affected by muscle quality, not just by quantity (mass) 8. USG has been used to evaluate muscle mass and quality without radiation exposure at a relatively low cost 12–15. Although there is no consensus on the definition of muscle quality in assessing sarcopenia, the subjective degree of increased echo intensity (EI) on GSU has been used to evaluate muscle quality 16. Increased echogenicity on USG reflects increased intramuscular adipose tissue, inflammation, or fibrosis and causes decreased muscle strength and altered stiffness 17. However, Pillen et al. reported a relatively low reproducibility of subjective EI evaluation compared to the quantification of muscle EI 18.
Recent studies have applied USG, including the SWE value of muscle, to evaluate sarcopenia and have suggested that SWE values, which can reflect the contractility (function) of muscle, are a potentially effective tool for clinical practice 4,5,8. In this study, we developed a diagnostic model for sarcopenia prediction using clinical information and USG parameters, including SWE measurements (SWV). However, unlike Chen et al., who evaluated elderly patients with type 2 diabetes 4, there was no statistically significant difference in the SWV values of RF between the "sarcopenia" and "non-sarcopenia" groups in either the development or evaluation cohort of the current study. This difference probably arises from the age range of the participants and the presence or absence of comorbidities that induce muscle wasting. Nevertheless, the diagnostic performance of the current study, which used numerical USG values (AUC, 78.47%), was similar to that of a previous study (AUC, 74–84%), which utilized SWE and GSU for sarcopenia prediction using deep convolutional neural network (DCNN) learning directly from the "image itself" 13.
Shear wave elastography (SWE), which is based on shear waves that propagate through tissues, can measure the elasticity and stiffness of tissues in the body. The quantified elasticity coefficients of SWE are represented as color-coded images or specific values from the ROI drawings that are difficult to quantify. In this study, we used both GSU and SWE for USG measurement acquisition and proposed a combination of USG measurements and clinical information at the feature and score levels. Fusing information at the feature level may increase the diversity of the acquired data. However, features obtained from different modalities may not be compatible in terms of size and discriminability. In addition, the increased feature dimensionality requires additional training data. In practice, fusing information at the score level is often preferred because of its ease of use in combining information from different modalities. Although some information loss may occur, score-level fusion is advantageous in terms of applicability. The score-level fusion approach is widely used in various classification tasks such as biometrics 19,20. It is a suitable candidate for medical imaging applications, including ultrasound image analysis, because it provides a straightforward and practical method of combining information from different modalities. In this study, we showed that fusing clinical and USG features delivered the best prediction performance at the score level, demonstrating reliable application capability in the evaluation cohort.
This study has several limitations. First, the development and evaluation cohorts were relatively small for learning-based methods. However, this was a pioneering study to quantify features from multiple modalities, and it demonstrated a similar level of diagnostic performance to previous studies. Second, we defined sarcopenia based on CT images without evaluating physical performance. Further studies incorporating functional tests such as patient gait speed or handgrip strength are required to confirm these results. Finally, relatively little effort was made to handle the data imbalance in the current study, although we adopted several classifiers. In future studies, we plan to improve the generalization capability of the sarcopenia prediction model by acquiring more datasets and adopting data augmentation and advanced learning methods such as one-shot learning to address the small data size and data imbalance.
In conclusion, we have successfully developed and assessed a sarcopenia prediction model. The score-level fusion approach showed better prediction performance than the feature-level fusion approach when both cohorts were considered. This study highlights the potential of machine learning fusion techniques to enhance the accuracy of sarcopenia prediction models and improve clinical decision-making in patients with sarcopenia.

Development and evaluation data sets
The development cohort was prospectively included between June 2019 and February 2020 using the following criteria: (i) 20 years ≤ age ≤ 69 years; (ii) no history of any cancer, diabetes, neuromuscular disorder, or other systemic disease that might cause muscle wasting (including renal disorder and cardiopulmonary disorder); (iii) healthy body mass index (BMI, 18.5–24.9 kg/m²); (iv) no history of trauma of the right lower extremity; (v) no history of lumbar spine operation; and (vi) not pregnant. Seventy participants were included and none were excluded (Fig. 3A). All 70 participants underwent USG (both GSU and SWE) at the right RF and CT at the L3 level on the same day. Additionally, the evaluation cohort was retrospectively selected from those who underwent both RF USG (both GSU and SWE) and CT, including the L3 level, within 1 month between December 2018 and May 2019. The exclusion criteria for the evaluation cohort were as follows: (i) postoperative status of the lumbar spine (n = 2) and (ii) CT scans acquired more than 1 month apart from USG (n = 14). Finally, 81 patients were included in the evaluation cohort (Fig. 3B). Clinical data including age, sex, height, weight, and BMI were collected from both cohorts. This retrospective study was approved by the institutional review board of Inje University Haeundae Paik Hospital (Approval No. 2023-05-024). The requirement for informed consent was waived. This study complied with the Declaration of Helsinki and the Health Insurance Portability and Accountability Act (HIPAA).

Ultrasonography evaluation and analyses
All subjects underwent right RF USG (LOGIQ E9; GE Healthcare, Wauwatosa, WI, USA) using a 9L-D linear array transducer (GE Healthcare), performed by a musculoskeletal fellowship-trained radiologist (J.Y., with 5 years of experience). The subjects were asked to lie supine with a relaxed, neutral ankle position. The mid-portion of the right RF was evaluated using USG, with a copious amount of gel placed on the skin to minimize external compression by the transducer. On gray-scale ultrasonography (GSU), the thicknesses of the RF and overlying subcutaneous fat (SCF), as well as the cross-sectional area (CSA) of the RF, were measured (Fig. 4A). After GSU, three consecutive SWE images of RF were acquired at the same location. Three circular regions of interest (ROIs) were drawn per SWE image (Fig. 4B). The average mean shear wave velocity (SWV) (m/s) in three consecutive images was calculated for statistical analyses.
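The averaging of SWV over ROIs and images can be sketched as follows. The text does not state exactly how the ROI values are pooled, so this sketch assumes that the mean SWV of the three ROIs is taken per image and that these per-image means are then averaged over the three images. The function name and the readings are hypothetical.

```python
def average_mean_swv(images):
    """Average the per-image mean SWV (m/s) over consecutive SWE images.

    `images` is a list with one entry per SWE image; each entry is a list
    of mean SWV readings, one per circular ROI drawn on that image.
    """
    per_image_means = [sum(rois) / len(rois) for rois in images]
    return sum(per_image_means) / len(per_image_means)


# Illustrative readings: three images, three ROIs each (not study data).
readings = [
    [1.8, 2.0, 2.2],
    [1.9, 2.1, 2.0],
    [2.0, 2.0, 2.0],
]
print(f"average mean SWV = {average_mean_swv(readings):.2f} m/s")
```

With an equal number of ROIs per image, this gives the same value as pooling all nine readings into a single mean.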

Assessing sarcopenia on CT
CT studies were conducted at a single center using two multidetector-row CT scanners on an axial plane, including the L3 vertebral level: a 64-slice system (Discovery CT 750 HD, GE Healthcare) and a 128-slice system (Definition AS+, Siemens Healthineers, Forchheim, Germany). CT images were acquired at 120 kVp and 132 mAs for both systems and reconstructed with a 5-mm slice thickness without intervals for all CT scans. A single slice at the L3 inferior endplate level was selected, and the image data in Digital Imaging and Communications in Medicine format were evaluated using the Asan J software (available at http://datasharing.aim-aicro.com/morphometry). A musculoskeletal radiologist (J.Y.), who was trained to manually trace the outline of the abdominal and back muscles (paraspinals, psoas, quadratus lumborum, transverse abdominal, abdominal internal/external oblique, and rectus abdominis), performed all CT image analyses, and the skeletal muscle area (SMA) was obtained using predetermined thresholds for the Hounsfield units (HU) on CT. The skeletal muscle index (SMI, cm²/m²) was calculated using the following formula to normalize values across patient heights: $\mathrm{SMI} = \mathrm{SMA}\ (\mathrm{cm}^2) / \mathrm{height}^2\ (\mathrm{m}^2)$. Sarcopenia was defined based on the Korean National Health and Nutrition Examination Study (KNHANES) 8, and the cutoff value for sarcopenia differed between the sexes (sarcopenia ≤ 49 cm²/m² for men and ≤ 31 cm²/m² for women).
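As a worked example of the height normalization and the sex-specific cutoffs, the following minimal sketch assumes the usual definition of SMI as the skeletal muscle area divided by the square of the height. The function names and the example values are hypothetical.

```python
# Sex-specific SMI cutoffs (cm^2/m^2) as stated in the text.
SMI_CUTOFFS = {"male": 49.0, "female": 31.0}


def skeletal_muscle_index(sma_cm2, height_m):
    """SMI in cm^2/m^2, assuming SMI = SMA / height^2."""
    return sma_cm2 / height_m ** 2


def is_sarcopenic(smi, sex):
    """Apply the sex-specific cutoff (at or below the cutoff is sarcopenia)."""
    return smi <= SMI_CUTOFFS[sex]


# Illustrative subject: SMA of 144.5 cm^2 at a height of 1.70 m.
smi = skeletal_muscle_index(144.5, 1.70)
print(f"SMI = {smi:.1f} cm^2/m^2, sarcopenia: {is_sarcopenic(smi, 'male')}")
```

The same muscle area in a shorter subject yields a higher SMI, which is the purpose of the normalization.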

Data preprocessing and representation
In our study, we used two types of data for sarcopenia prediction: clinical and USG features. The clinical features for sarcopenia prediction include numerical and categorical data representing the clinical characteristics of patients. These features included age, sex, height, weight, and BMI. We defined the USG features as GSU and SWE assessments, including the thickness of the mid-RF, overlying SCF, CSA of the mid-RF, and average mean SWV. All features with numerical values were rescaled using the minimum and maximum values of the training set. For binary categorical data, we assigned zero and one to the two categories. Subsequently, the clinical and USG feature vectors for sarcopenia prediction were denoted as $x_C = [x_{\mathrm{age}}, x_{\mathrm{sex}}, x_{\mathrm{height}}, x_{\mathrm{weight}}, x_{\mathrm{BMI}}]^T$ and $x_U = [x_{\mathrm{mRFT}}, x_{\mathrm{mRFOFT}}, x_{\mathrm{mRFCSA}}, x_{\mathrm{SWV}}]^T$, respectively. Here, $x_{\mathrm{mRFT}}$, $x_{\mathrm{mRFOFT}}$, $x_{\mathrm{mRFCSA}}$, and $x_{\mathrm{SWV}}$ are the mid-RF thickness, mid-RF overlying SCF thickness, mid-RF CSA, and average SWV, respectively.
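The rescaling step can be sketched as below. The key point is that the minimum and maximum come from the training set only and are then reused for unseen subjects, whose rescaled values may therefore fall outside [0, 1]. The function names, the choice of which sex is coded as one, and the example values are assumptions made for illustration.

```python
def fit_min_max(train_rows):
    """Return per-feature minima and maxima computed on the training set."""
    n_features = len(train_rows[0])
    mins = [min(row[j] for row in train_rows) for j in range(n_features)]
    maxs = [max(row[j] for row in train_rows) for j in range(n_features)]
    return mins, maxs


def apply_min_max(row, mins, maxs):
    """Rescale one subject with the training-set minima and maxima."""
    return [
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, lo, hi in zip(row, mins, maxs)
    ]


def encode_sex(sex):
    """Binary coding; mapping 'male' to 1 is an arbitrary choice here."""
    return 1.0 if sex == "male" else 0.0


# Illustrative numerical features: age (years), height (cm), weight (kg).
train = [[30, 160, 50], [50, 170, 60], [70, 180, 80]]
mins, maxs = fit_min_max(train)
x_new = apply_min_max([60, 165, 65], mins, maxs) + [encode_sex("female")]
print(x_new)
```

Fitting the scaler on the training set alone keeps information from the held-out subject out of the preprocessing during cross-validation.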

Sarcopenia prediction model
In this study, we proposed fusing clinical and USG information to predict sarcopenia. To verify the sarcopenia prediction performance enhancement, we compared the performance before and after combining clinical and USG features. For sarcopenia prediction using each clinical and USG feature separately, we adopted a linear regression model with least absolute shrinkage and selection operator (LASSO) regularization 10 for each modality (clinical and USG features). For simplicity, we refer to the adopted model as the LASSO. While a ridge regression model penalizes the sum of squares of the weight coefficients to prevent overfitting 21, LASSO estimates the weight coefficients by shrinking the sum of their absolute values to less than a fixed value. Consequently, some of the weight coefficients are forced to be zero, resulting in feature selection capability. Because of this advantage, LASSO has been applied to various regression and classification applications, including medicine 22–24.
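The feature selection effect of LASSO can be seen in a small sketch. It uses the penalized (Lagrangian) form of the problem, which minimizes (1/2n)·||y − Xw − b||² + alpha·||w||₁, solved by coordinate descent with soft-thresholding. This is an illustrative implementation and not the one used in the study; the function names, the toy data, and the value of alpha are assumptions.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; anything within [-alpha, alpha] becomes 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_fit(X, y, alpha, n_sweeps=100):
    """Coordinate descent for (1/2n)*||y - Xw - b||^2 + alpha*||w||_1."""
    n, d = len(X), len(X[0])
    # Center features and target so the intercept can be recovered at the end.
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    yc = [value - y_mean for value in y]
    w = [0.0] * d
    for _ in range(n_sweeps):
        for j in range(d):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Prediction from all features except feature j.
                partial = sum(Xc[i][k] * w[k] for k in range(d) if k != j)
                rho += Xc[i][j] * (yc[i] - partial)
                z += Xc[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    b = y_mean - sum(w[j] * x_mean[j] for j in range(d))
    return w, b


# Toy data: feature 0 tracks the label, feature 1 is mostly noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.7], [0.9, 0.2], [1.0, 0.6]]
y = [0, 0, 0, 1, 1, 1]
weights, intercept = lasso_fit(X, y, alpha=0.05)
print(weights, intercept)
```

On this toy data, the weight of the uninformative second feature is driven exactly to zero while the first is only shrunk, which is the selection behavior described above. A ridge penalty would instead leave both weights nonzero.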
Information fusion approaches at various stages of pattern classification systems have been thoroughly investigated in several publications 19,25–27. To combine the clinical and USG features, we propose fusing them at both the feature and score levels. For the feature-level fusion approach, we generated a feature vector, $x_F = [x_C^T, x_U^T]^T$, which concatenated the clinical and USG feature vectors. Subsequently, we adopted LASSO for classification, as in sarcopenia prediction using each feature modality. To fuse the clinical and USG features at the score level, a new vector concatenating the LASSO model outputs for each feature modality was produced, denoted as $x_S = [x_{SC}, x_{SU}]^T$. In this representation, $x_{SC}$ and $x_{SU}$ are the normalized LASSO model outputs obtained using $x_C$ and $x_U$, respectively. Then, we adopted the sum rule 28, least squares estimation (LSE) 29, area above the receiver operating characteristic (ROC) curve (AAC) optimization 20, random forest (RF) 30, support vector machine (SVM) 31, and adaptive boosting (AdaBoost) 32 as classification techniques. The score-level fusion approach can adopt various types of classifiers in the process. Therefore, we adopted several well-known classifiers for performance comparison and diversity. We only provide input representations (feature vectors) because classification techniques such as LASSO, LSE, AAC optimization, RF, SVM, and AdaBoost are generic algorithms. We note that LSE and AAC search for a deterministic solution by minimizing the sum of squared errors and the approximated area above a receiver operating characteristic curve, respectively, and SVM searches for a solution in an iterative manner by maximizing the margin. Unlike these single classifiers, random forest and AdaBoost are ensemble classifiers based on bagging and boosting. We refer the readers to the original works 20,29–32 for a detailed exploration. Figure 5 presents an overview of the proposed sarcopenia prediction model.
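The two fusion levels differ only in what is concatenated, as the following minimal sketch shows. Feature-level fusion joins the clinical and USG feature vectors before a single model is trained, whereas score-level fusion joins the two normalized model outputs and passes them to a second-stage rule. The sum rule is shown as the simplest such rule. The min-max normalization of the scores, the function names, and all numbers are assumptions, as the text does not specify how the LASSO outputs were normalized.

```python
def feature_level_fusion(x_c, x_u):
    """Concatenate clinical and USG feature vectors into one vector x_F."""
    return list(x_c) + list(x_u)


def normalize_score(score, train_min, train_max):
    """Min-max normalize one model output with training-set score limits."""
    return (score - train_min) / (train_max - train_min)


def score_vector(raw_c, raw_u, limits_c, limits_u):
    """Build x_S = [x_SC, x_SU] from the two single-modality model outputs."""
    return [normalize_score(raw_c, *limits_c), normalize_score(raw_u, *limits_u)]


def sum_rule(x_s):
    """Sum-rule fusion: the fused score is the sum of the normalized scores."""
    return sum(x_s)


# Illustrative rescaled features: five clinical and four USG values.
x_c = [0.45, 1.0, 0.60, 0.55, 0.40]
x_u = [0.30, 0.70, 0.25, 0.50]
x_f = feature_level_fusion(x_c, x_u)  # nine-dimensional input for one model

# Illustrative raw outputs of the clinical and USG models for one subject.
x_s = score_vector(0.8, 0.3, limits_c=(-0.2, 1.2), limits_u=(-0.1, 0.9))
print(len(x_f), x_s, sum_rule(x_s))
```

The other second-stage classifiers listed above (LSE, AAC optimization, random forest, SVM, AdaBoost) would take the same two-dimensional vector `x_s` as input in place of the sum.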

Statistical analyses
The Shapiro-Wilk test was used to determine whether the data were normally distributed. Paired and independent t-tests and chi-square tests were performed to compare the clinical parameters, SMI, and USG parameters between the "sarcopenia" and "non-sarcopenia" groups and between the "development" and "evaluation" cohorts. Pearson's correlation was used to evaluate the correlation between SMI and USG measurement parameters. We also evaluated the sarcopenia prediction performance of the model in terms of accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the area under the receiver operating characteristic (ROC) curve (AUC). The optimal threshold values for the accuracy, sensitivity, specificity, PPV, and NPV were selected according to the Youden index. The threshold value is set at $\max(r_{\mathrm{SEN}} + r_{\mathrm{SPE}} - 1)$, where $r_{\mathrm{SEN}}$ and $r_{\mathrm{SPE}}$ denote sensitivity and specificity, respectively. For each performance indicator, the average and standard deviation values were reported using the results from leave-one-out cross-validation tests for the development set. For each training set, stratified tenfold validation was performed for hyper-parameter tuning. Similarly, for the evaluation set, we reported the average and standard deviation values by applying the trained models on the evaluation set. To verify whether the performance enhancement by fusion was statistically significant, we performed a paired t-test 26.
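Threshold selection by the Youden index can be sketched as follows. Every observed score is tried as a candidate threshold, and the one that maximizes sensitivity + specificity − 1 is kept; the remaining indicators are then computed at that threshold. The scores and labels are made up for illustration, the function names are hypothetical, and both classes are assumed to be present.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when a score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        if score >= threshold:
            if label == 1:
                tp += 1
            else:
                fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    best_threshold, best_j = None, float("-inf")
    for candidate in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(scores, labels, candidate)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


def summary_metrics(scores, labels, threshold):
    """Accuracy, sensitivity, specificity, PPV, and NPV at one threshold."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


# Illustrative model scores and labels (1 = sarcopenia); not study data.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
threshold, j_value = youden_threshold(scores, labels)
print(threshold, j_value, summary_metrics(scores, labels, threshold))
```

Because the Youden index weights sensitivity and specificity equally, the chosen threshold need not be the one that maximizes accuracy when the classes are imbalanced.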

Figure 1. Correlation between skeletal muscle index (SMI) and ultrasonographic measurements. (A) Correlation between SMI and the right rectus femoris muscle (RF) thickness. (B) Correlation between SMI and the cross-sectional area (CSA) of RF. (C) Correlation between SMI and the subcutaneous fat (SCF) thickness. (D) Correlation between SMI and shear-wave velocity (SWV) of RF.

Figure 2. ROC curves from the experiments using the development set and evaluation set. (A) ROC curves from the leave-one-out cross-validation tests using the development set. (B) ROC curves from applying the trained models on the evaluation set.
https://doi.org/10.1038/s41598-024-52614-2

Figure 3. Flow diagram for development and evaluation data sets.

Figure 4. Representative images of ultrasonography measurements of the mid rectus femoris muscle (RF). (A) On grayscale ultrasonography, the thickness of RF (white dashed double arrow), cross-sectional area of RF (black dashed line), and the thickness of the subcutaneous fat layer (SCF) overlying RF (white double arrow) were measured. (B) On shear-wave elastography, the mean shear wave velocity (SWV, m/s) was measured.

Figure 5. An overview of the proposed sarcopenia prediction model. LASSO is adopted for the clinical and USG features and for fusing them at the feature level. Additionally, score-level fusion exploits several classifiers, such as SR, LSE, AAC minimization, RF, SVM, and AdaBoost.

Table 1. Comparison of the characteristics of development and evaluation cohorts. BMI, body mass index; SMI, skeletal muscle index; RF, rectus femoris muscle; SCF, subcutaneous fat layer; CSA, cross-sectional area.
a Student's t-test. b Chi-square test. *Comparison between development and evaluation cohorts.